Tensor Decompositions for Very Large Scale Problems
Authors
Abstract
Modern applications such as neuroscience, text mining, and large-scale social networks generate massive amounts of data with multiple aspects and high dimensionality. Tensors (i.e., multi-way arrays) provide a natural representation for such data. Consequently, tensor decompositions and factorizations are emerging as promising tools for exploratory analysis of multidimensional data in diverse disciplines, including social network analysis. Tensor decompositions, especially the DEDICOM, PARAFAC, and TUCKER models, are important tools for the exploration of social networks because they capture multi-linear and multi-aspect structures in massive higher-order datasets. However, most existing algorithms for tensor decompositions and factorizations are not suitable for large-scale problems, since memory overflows usually occur during the decomposition process [1], [2]. In this paper we propose novel algorithms for three-way nonnegative DEDICOM and TUCKER models that are suitable for very large-scale problems; in fact, they allow us to work with much larger dense tensors than could be handled before. In our approach, we first decompose the three-way data into a Tucker-3 or Tucker-2 model, and then retrieve the desired DEDICOM matrices from the Tucker factors and core tensor. For very large-scale problems we developed a novel divide-and-conquer procedure that splits the data tensor into sub-tensors, performs Tucker-3 decomposition on each sub-tensor in parallel, and then reassembles the results to estimate the desired factor (component) matrices of the DEDICOM model. We illustrate the validity and high performance of the proposed algorithms with two illustrative examples.
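The Tucker-3 step described above can be sketched with a standard truncated higher-order SVD (HOSVD). This is a minimal illustrative sketch, not the paper's actual nonnegative DEDICOM/divide-and-conquer algorithm: the function name, chosen ranks, and random test data below are assumptions for demonstration only.

```python
import numpy as np

def hosvd_tucker3(T, ranks):
    """Tucker-3 decomposition of a 3-way tensor via truncated HOSVD (sketch).

    Returns a core tensor G and factor matrices (U1, U2, U3) such that
    T ~ G x_1 U1 x_2 U2 x_3 U3.
    """
    factors = []
    for mode in range(3):
        # Mode-n unfolding: put `mode` first, flatten the remaining modes.
        unfolding = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
        U, _, _ = np.linalg.svd(unfolding, full_matrices=False)
        factors.append(U[:, :ranks[mode]])  # leading left singular vectors
    # Core tensor: project T onto each factor subspace, mode by mode.
    G = T
    for mode, U in enumerate(factors):
        G = np.moveaxis(np.tensordot(U.T, np.moveaxis(G, mode, 0), axes=1), 0, mode)
    return G, factors

# Usage: a tensor of exact multilinear rank (4, 3, 2) is recovered exactly.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 4))
B = rng.standard_normal((20, 3))
C = rng.standard_normal((10, 2))
core = rng.standard_normal((4, 3, 2))
T = np.einsum('abc,ia,jb,kc->ijk', core, A, B, C)

G, (U1, U2, U3) = hosvd_tucker3(T, (4, 3, 2))
T_hat = np.einsum('abc,ia,jb,kc->ijk', G, U1, U2, U3)
print(np.allclose(T, T_hat))  # prints True
```

Once the Tucker core and factors are available, they are far smaller than the original dense tensor, which is what makes the subsequent recovery of DEDICOM matrices (and the parallel processing of sub-tensors) tractable at large scale.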
Related works
Tensor Networks for Big Data Analytics and Large-Scale Optimization Problems
Tensor decompositions and tensor networks are emerging and promising tools for data analysis and data mining. In this paper we review basic and emerging models and associated algorithms for large-scale tensor networks, especially Tensor Train (TT) decompositions, using novel mathematical and graphical representations. We discuss the concept of tensorization (i.e., creating very high-order tensors...
Estimating a Few Extreme Singular Values and Vectors for Large-Scale Matrices in Tensor Train Format
We propose new algorithms for singular value decomposition (SVD) of very large-scale matrices based on a low-rank tensor approximation technique called the tensor train (TT) format. The proposed algorithms can compute several dominant singular values and corresponding singular vectors for large-scale structured matrices given in a TT format. The computational complexity of the proposed methods ...
Tensor Networks for Dimensionality Reduction and Large-scale Optimization: Part 1 Low-Rank Tensor Decompositions
Machine learning and data mining algorithms are becoming increasingly important in analyzing large volume, multi-relational and multi– modal datasets, which are often conveniently represented as multiway arrays or tensors. It is therefore timely and valuable for the multidisciplinary research community to review tensor decompositions and tensor networks as emerging tools for large-scale data an...
ParCube: Sparse Parallelizable Tensor Decompositions
How can we efficiently decompose a tensor into sparse factors, when the data does not fit in memory? Tensor decompositions have gained a steadily increasing popularity in data mining applications, however the current state-of-art decomposition algorithms operate on main memory and do not scale to truly large datasets. In this work, we propose ParCube, a new and highly parallelizable method for ...
PARCUBE: Sparse Parallelizable CANDECOMP-PARAFAC Tensor Decomposition
How can we efficiently decompose a tensor into sparse factors, when the data does not fit in memory? Tensor decompositions have gained a steadily increasing popularity in data mining applications, however the current state-of-art decomposition algorithms operate on main memory and do not scale to truly large datasets. In this work, we propose PARCUBE, a new and highly parallelizable method for ...
Publication date: 2010